Record: dTTT + BigramHash 3072×112 — val_bpb 1.0800 (3-seed mean) #1408
Open
aamodbhatt wants to merge 1 commit into openai:main
Discriminative pre-quant AdamW TTT (per-block LR 0.3x-1.0x, 10 epochs, freeze=0) on BigramHash 3072x112 base. Builds on PR openai#1351 dTTT framework; BigramHash scaled from 2048x128 to 3072x112. 3-seed mean 1.0800 (std 0.0002), all artifacts under 16MB.
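As a rough illustration of the per-block learning-rate scaling in the description above, the sketch below assigns each block a multiplier between 0.3x and 1.0x and builds optimizer parameter groups from it. The linear ramp, the ordering (earliest block gets the smallest multiplier), the block count of 11 (suggested only by `11L` in the records directory name), the base learning rate, and the helper names `block_lr_multipliers` and `build_param_groups` are all assumptions for illustration; the PR text states only the 0.3x-1.0x range.

```python
def block_lr_multipliers(num_blocks, lo=0.3, hi=1.0):
    """Return one LR multiplier per block, ramping linearly from lo to hi.

    Assumption: a linear ramp with the earliest block getting the smallest
    multiplier. The PR states only the 0.3x-1.0x range, not the schedule.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be >= 1")
    if num_blocks == 1:
        return [hi]
    step = (hi - lo) / (num_blocks - 1)
    return [lo + i * step for i in range(num_blocks)]


def build_param_groups(block_params, base_lr):
    """Pair each block's parameters with its scaled learning rate.

    The result is a list of {"params": ..., "lr": ...} dicts, the shape
    that optimizers such as torch.optim.AdamW accept as parameter groups.
    """
    mults = block_lr_multipliers(len(block_params))
    return [{"params": p, "lr": base_lr * m} for p, m in zip(block_params, mults)]


# Hypothetical values: 11 blocks and a base LR of 1e-3 are placeholders,
# and the strings stand in for real parameter tensors.
groups = build_param_groups([[f"block{i}.weight"] for i in range(11)], base_lr=1e-3)
for g in groups:
    print(g["params"][0], round(g["lr"], 6))
```

With `freeze=0` no block is excluded, so every block appears in some group; freezing would correspond to dropping the earliest groups from the list.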
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request on Apr 7, 2026
…ctions
- N-gram Tilt bug: PR openai#1420 kernel is non-causal; PR openai#1437 (dexhunter) found/fixed it (pre-fix 1.07807 → post-fix 1.08091). Updated primary reference to PR openai#1437 kernel.
- PR openai#1423 flagged illegal (pre-quant TTT, same as openai#1351/openai#1408/openai#1416)
- Added full PR openai#1421–1444 scan results
- Updated best open legal PR: ~1.08091 (PR openai#1437) not 1.08014 (openai#1420)
- Session 8 lessons learned added to CLAUDE.md
https://claude.ai/code/session_01XLD5qpZfXpmJPnuT9kSnPC
abaybektursun pushed a commit to abaybektursun/parameter-golf that referenced this pull request on Apr 7, 2026
Comprehensive leaderboard of openai/parameter-golf record submissions compiled from open PRs. Each entry classified as valid/invalid/suspect based on source code review against PR openai#1017 validity rules.
Key findings:
- Best verified-valid score: 1.0800 BPB (PR openai#1408)
- 3 submissions confirmed invalid (pre-quant TTT, unnormalized n-gram)
- Sub-0.70 BPB submissions violate normalization requirements
- 6 submissions fully code-reviewed and verified valid
https://claude.ai/code/session_017F8GGeKA7MhUoQdqMGcTpg
abaybektursun pushed a commit to abaybektursun/parameter-golf that referenced this pull request on Apr 7, 2026
Deep review of train_gpt.py reveals ttt_adapt_adamw() trains on val data for 10 full epochs (TTT_EPOCHS=10, TTT_ENABLED=1 by default) before quantization. This is the same pre-quantization TTT violation as PRs openai#1423 and openai#1416 — the artifact encodes information from the entire validation set, violating strict causal dependence. The ~0.04-0.05 BPB improvement from dTTT is entirely attributable to fitting the test set. Best verified-valid score updated to 1.0801 BPB (PR openai#1420). https://claude.ai/code/session_017F8GGeKA7MhUoQdqMGcTpg
abaybektursun pushed a commit to abaybektursun/parameter-golf that referenced this pull request on Apr 7, 2026
Local copy of aamodbhatt's train_gpt.py from PR openai#1408 used during the thorough validity review that identified the pre-quant dTTT violation (10 epochs on val data). https://claude.ai/code/session_017F8GGeKA7MhUoQdqMGcTpg
taka6745 pushed a commit to taka6745/paramgolf that referenced this pull request on Apr 9, 2026
Two of the three comp-frontier wins are env-var bumps with no code change:
- LOOP_START 4 → 3 (with NUM_LOOPS=2 and LOOP_END=5 this gives 3-layer recurrence on layers 3/4/5 instead of 2-layer on 4/5). PR openai#1485 / openai#1471 / openai#1437 use this. Expected -0.005 to -0.01 BPB.
- QK_GAIN_INIT 4 → 5. PRs openai#1413, openai#1423, openai#1485, openai#1437, openai#1351, openai#1408 are at 5; openai#1482 is at 5.25. PR openai#1477's default 4 is below the leaderboard curve. Expected -0.001 BPB.
C1 (Pre-Quant AdamW TTT) is the bigger win (-0.014 BPB) but requires real code — agent is researching PR openai#1485 / openai#1416 / openai#1306 implementations in background.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
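To make the LOOP_START change in the commit message above concrete, here is a minimal sketch of how the recurrent layer range could follow from the environment variables. It assumes the training script reads `LOOP_START` and `LOOP_END` from the environment and treats `LOOP_END` as inclusive, which matches the "layers 3/4/5" wording but is not confirmed by the source; `looped_layers` is a hypothetical helper, not a function from `train_gpt.py`.

```python
import os


def looped_layers(env=os.environ):
    """Return the layer indices covered by the recurrence loop.

    Assumptions: LOOP_END is inclusive, and the defaults are 4 and 5
    (the pre-change values quoted in the commit message).
    """
    start = int(env.get("LOOP_START", "4"))
    end = int(env.get("LOOP_END", "5"))
    if start > end:
        raise ValueError("LOOP_START must not exceed LOOP_END")
    return list(range(start, end + 1))


# Before and after the env-var bump described in the commit message.
print(looped_layers({"LOOP_START": "4", "LOOP_END": "5"}))
print(looped_layers({"LOOP_START": "3", "LOOP_END": "5"}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to check without touching the real environment.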
Record Summary
Final submitted score: val_bpb 1.0800 (std 0.0002)
Reference neural roundtrip: 1.09935 (std 0.00007)
Hardware: 8×H100 SXM | Artifact: ≤15.9 MB | Training: ≤600s
What changed
3-Seed Results
Submission Checklist
records/track_10min_16mb/2026-04-06_dTTT_BH3072_11L_8xH100/
Metric Verification
- final_int6_sliding_window_exact in each seed log
- final_int6_roundtrip_exact in each seed log
- Total submission size int6+lzma in each seed log
Credits